N-gram Language Modeling of Japanese Using Prosodic Boundaries

Authors

  • Keikichi Hirose
  • Nobuaki Minematsu
  • Makoto Terao

Abstract

A new method was developed for incorporating prosodic boundary information into statistical language modeling. The method counts word transitions separately for transitions that cross accent phrase boundaries and for those that do not. Counting these two types of word transitions directly would require a large speech corpus, which is practically impossible to build, so bi-gram counts of part-of-speech (POS) transitions were instead first calculated separately for the two cases on a small speech corpus. Word bi-gram counts calculated on a large-scale text corpus were then divided into the two cases according to their POS transitions, finally yielding two word bi-gram models: one for transitions crossing accent phrase boundaries and one for transitions that do not. The method was evaluated by the perplexity reduction of the proposed models relative to the baseline models. When correct boundary positions were used, the reduction reached 11%; when boundaries were extracted using our previously developed method based on mora-F0 transition modeling, it was 8%. A reduction of around 6% was still observed for speech uttered by a speaker different from the speaker of the corpus used to calculate the POS bi-gram counts.
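
The counting scheme above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say exactly how word bi-gram counts are divided by the POS transition feature, so the sketch assumes each word bi-gram count is split in proportion to how often its POS pair crossed an accent phrase boundary in the labeled speech corpus. Add-one smoothing stands in for whatever smoothing the authors used. The function names (pos_boundary_ratio, split_word_bigrams, bigram_prob, perplexity), the fallback ratio of 0.5 for unseen POS pairs, and the toy data are all hypothetical. Perplexity is computed by scoring each word transition with the model that matches its boundary label, which may come from correct positions or from a boundary detector.

```python
from collections import Counter
import math


def pos_boundary_ratio(pos_bigrams):
    """Estimate P(cross | pos1, pos2) from (pos1, pos2, crosses) triples
    taken from a small speech corpus labeled with accent phrase boundaries."""
    total, crossed = Counter(), Counter()
    for p1, p2, crosses in pos_bigrams:
        total[(p1, p2)] += 1
        if crosses:
            crossed[(p1, p2)] += 1
    return {pair: crossed[pair] / n for pair, n in total.items()}


def split_word_bigrams(word_counts, pos_of, ratio, default=0.5):
    """Divide text-corpus word bigram counts into crossing and non-crossing
    parts. ASSUMPTION: a count is split in proportion to its POS pair's
    crossing ratio; unseen POS pairs fall back to the hypothetical `default`."""
    crossing, within = Counter(), Counter()
    for (w1, w2), c in word_counts.items():
        r = ratio.get((pos_of[w1], pos_of[w2]), default)
        crossing[(w1, w2)] += c * r
        within[(w1, w2)] += c * (1.0 - r)
    return crossing, within


def bigram_prob(counts, vocab_size, w1, w2):
    """Add-one smoothed P(w2 | w1) from possibly fractional counts
    (a stand-in for the authors' unspecified smoothing)."""
    history = sum(c for (a, _), c in counts.items() if a == w1)
    return (counts.get((w1, w2), 0.0) + 1.0) / (history + vocab_size)


def perplexity(words, boundaries, crossing, within, vocab_size):
    """boundaries[i] is True when an accent phrase boundary (correct or
    detected) lies between words[i] and words[i + 1]; each transition is
    scored by the model matching its boundary label."""
    log_sum = 0.0
    for i in range(len(words) - 1):
        model = crossing if boundaries[i] else within
        log_sum += math.log(bigram_prob(model, vocab_size, words[i], words[i + 1]))
    return math.exp(-log_sum / (len(words) - 1))


if __name__ == "__main__":
    # Toy data, purely illustrative: hypothetical POS tags and romanized words.
    ratio = pos_boundary_ratio([("N", "P", False), ("P", "V", True),
                                ("P", "V", True), ("P", "V", False)])
    pos_of = {"hon": "N", "wo": "P", "yomu": "V"}
    counts = Counter({("hon", "wo"): 6, ("wo", "yomu"): 3})
    crossing, within = split_word_bigrams(counts, pos_of, ratio)
    print(perplexity(["hon", "wo", "yomu"], [False, True], crossing, within, 3))
```

A baseline for comparison can be obtained by scoring the same word sequence with a single model built from the unsplit counts. The toy counts say nothing about the 11%, 8%, and 6% reductions reported above.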

Similar resources

Continuous Speech Recognition of Japanese Using Prosodic Word Boundaries Detected by Mora Transition Modeling of Fundamental Frequency Contours

An HMM-based method of detecting prosodic word boundaries was developed for Japanese continuous speech and was successfully integrated into a mora-basis continuous speech recognition system with two stages operating without and with prosodic information. The method is based on modeling the fundamental frequency (F0) contour of input speech as transitions of mora-unit F0 contours and operates af...

N-gram language modeling of Japanese using bunsetsu boundaries

A new scheme of N-gram language modeling was proposed for Japanese, where word N-grams were calculated separately for the two cases: crossing and not crossing bunsetsu boundaries. Here, bunsetsu is a basic grammatical (and pronunciation) unit of Japanese. A similar scheme using accent phrase boundaries instead of bunsetsu boundaries has already been proposed by the authors with a certain succes...

The role of prosodic boundaries in word discovery: Evidence from a computational model.

This study aims to quantify the role of prosodic boundaries in early language acquisition using a computational modeling approach. A spoken term discovery system that models early word learning was used with and without a prosodic component on speech corpora of English, Spanish, and Japanese. The results showed that prosodic information induces a consistent improvement both in the alignment of ...

Effects of prosodic boundaries on ambiguous syntactic clause boundaries in Japanese

We report the results of experiments designed to investigate the effects of prosodic boundaries on resolving ambiguous syntactic clause boundaries in Japanese. The head-final, pro-drop nature of this language generates abundant syntactic attachment ambiguity for sentences that contain relative clauses. Two types of sentences with differing head nouns modified by relative clauses were examined. S...

Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling

This paper proposes an automatic prosodic labeling technique for constructing speech databases used for speech synthesis. In corpus-based Japanese speech synthesis, it is essential to use speech data annotated with prosodic information such as phrase boundaries and accent types. However, manual annotation is generally time-consuming and expensive. To overcome this problem, we propose an esti...

Publication date: 2002